ABSTRACT

In 1957 L. R. Ford, Jr., developed a procedure that produces a
rank order of objects from subjective judgments. Standard
procedures usually require that the number of comparisons between
any given pair of objects equal the number between any other pair.
Ford's method does not require any specific number of comparisons
between pairs, and it allows for missing data. A computer program
was developed utilizing Ford's technique. This study adapted the
program for use on the IBM 360/67 and evaluated the validity of
the program and model, which appeared good. Applications for the
use of such a program in the Navy were cited.


TABLE OF CONTENTS

I.   INTRODUCTION

II.   METHOD

A.  TEST CATEGORIES

B.  SUBJECTS

C.  PROCEDURE

D.  DESIGN

E.  COMPUTER PROGRAM

1.  Main Program

2.  Subroutine C0RE2

3.  Subroutine C0RE3

III.   RESULTS

IV.   DISCUSSION

V.   APPLICATIONS FOR USE OF PROGRAM IN U.S. NAVY

A.  SPECIFIC USES

B.  AN EXAMPLE

VI.   CONCLUSIONS

APPENDIX A:   Instructions to Judges - Testing Phase

APPENDIX B:   Accepted Standard Rank Ordering of Items by Category

APPENDIX C:   Computer Rank Ordering of Items by Category

APPENDIX D:   Instructions to Judges - Application Example

APPENDIX E:   Computer Rank Ordering of Research Projects
Application Example

APPENDIX F:   Evaluation of Items per Category by Ten Judges

APPENDIX G:   Evaluation of Abstracts by Ten Judges

COMPUTER PROGRAM

BIBLIOGRAPHY

INITIAL DISTRIBUTION LIST

FORM DD 1473


LIST OF FIGURES

Figure

1.  Weight Change Per Iteration - CATEGORY I

2.  Weight Change Per Iteration - CATEGORY II

3.  Weight Change Per Iteration - CATEGORY III

4.  Weight Change Per Iteration - CATEGORY IV


I.   INTRODUCTION

Consider a number of objects to be compared according to the
degree to which they exhibit some common quality. If the quality
is measurable in some objective way, the problem is amenable to
treatment by well understood methods. It may happen, however,
either for theoretical or practical reasons, that the quality is
not measurable, or that there is little intuitive feeling as to
what form the distribution of the measurements in the population
is likely to take. It is then necessary to rely on judgments of a
more or less subjective nature, made after a comparison of the
objects among themselves. One method of comparison which has been
widely used is that of ranking.

Bradley (1953) argued that criticism of ranking methods stems
from a supposed loss of efficiency. When quantitative judgments
can be obtained, the magnitude of differences is obscured by the
use of ranks. On the other hand, when treatment differences are
small and difficult to detect, it would appear reasonable to
simplify the procedure for the judge and use a ranking technique.
The rank order method is usually computationally simple and is
often preferred on this ground alone.

Kendall and Smith (1940) investigated a method of preferences in
which n objects are presented in all n(n-1)/2 possible pairs and
an observer indicates a preference for one object of each pair
over the other. They measured the reliability of judgments on the
part of the observer and the concordance of preferences between
observers. In this treatment they excluded ties. Another
comparison technique stemmed from research for the Army
demobilization point system. This study by Guttman (1946) covered
not only ordinary comparisons but also situation comparisons which
combined several variates. The developments excluded judgments of
equality and assumed that all people compared all pairs. White
(1952) presented methods and developed tables for determining the
significance of the difference between two treatments in a ranking
procedure. This procedure required quantitative values which are
ranked and then summed.

One method which has received considerable attention is the
method of "paired comparisons." Bradley and Terry (1952) developed
the method of paired comparisons for the rank analysis of
incomplete block designs. The procedures are applicable where
qualitative measurements are reliable and useful in problems
involving subjective ranking by judges. No provisions were made
for ties or for not ranking a particular pair or group of
treatments.

A solution of the ranking problem from binary comparisons
developed by Ford (1957) closely paralleled the development by
Bradley and Terry. This procedure is singularly important in that
it handles problem areas not provided for in any preceding
development. Standard procedures usually require that the number
of comparisons between any given pair be equal to the number
between any other pair. Ford's method does not require any
specific number of comparisons between pairs, and it allows for
missing data. These two provisions permit considerable flexibility
among judges making difficult comparisons.

Ford assumed a matrix $A = (a_{ij})$, where $a_{ij}$ represented
the number of times object i had been preferred to object j. Ford
associated with each object i a weight $w_i$. These weights would
be interpreted as odds, in the sense that the probability of i
being preferred to j in a future comparison would be taken to be
$w_i/(w_i + w_j)$. With these probabilities, one could compute the
a priori probability of obtaining precisely the matrix of results
obtained, that is, the matrix A.
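As a hedged illustration of the probability just described, the likelihood of the observed matrix can be written in the usual product form for this kind of odds model. The binomial coefficient below is an assumption (it applies when the order of the comparisons within a pair is ignored), and it does not affect which weights maximize the expression.

```latex
L(w) \;=\; \prod_{i<j} \binom{a_{ij}+a_{ji}}{a_{ij}}
      \left(\frac{w_i}{w_i+w_j}\right)^{a_{ij}}
      \left(\frac{w_j}{w_i+w_j}\right)^{a_{ji}}
```

Pairs that were never compared have $a_{ij} = a_{ji} = 0$ and contribute a factor of 1, which is how missing data enter the model without special handling.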

In order to determine the set of weights which maximized the
likelihood of obtaining matrix A, Ford solved, by an iterative
technique, the following equation for each object until the
weight stabilized.

$$w_i^{(n+1)} = \frac{\sum_{j \ne i} a_{ij}}{\sum_{j \ne i} \dfrac{a_{ij} + a_{ji}}{w_i^{(n)} + w_j^{(n)}}}$$

where $a_{ij}$ = number of times object i was preferred to object
j; $a_{ji}$ = number of times object j was preferred to object i;
and $w_i^{(n)}$ = weight assigned to object i on the nth iteration.
Ford made the following assumption about matrix A: "In every
possible partition of the objects into two nonempty subsets,
some object in the second set has been preferred at least
once to some object in the first set." In order to yield a
solution the data must meet this criterion.
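The link between maximizing the likelihood and an iterative equation of this kind can be sketched as follows. This is the standard stationarity argument for the odds model with win counts $a_{ij}$ and weights $w_i$, offered as an illustration rather than as a transcription of Ford's own derivation.

```latex
\log L(w) = \text{const}
  + \sum_i \Big(\sum_{j \ne i} a_{ij}\Big) \log w_i
  - \sum_{i<j} (a_{ij}+a_{ji}) \log (w_i + w_j)

\frac{\partial \log L}{\partial w_i}
  = \frac{\sum_{j \ne i} a_{ij}}{w_i}
  - \sum_{j \ne i} \frac{a_{ij}+a_{ji}}{w_i+w_j} = 0
\quad\Longrightarrow\quad
w_i = \frac{\sum_{j \ne i} a_{ij}}
           {\sum_{j \ne i} \dfrac{a_{ij}+a_{ji}}{w_i+w_j}}
```

Using the current weights on the right-hand side and taking the left-hand side as the next weight gives a fixed-point iteration. The partition assumption guarantees that no weight is driven to zero or to infinity.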


In the early 1960's, in an effort to determine a rank ordering of
measures of scientific performance, Pelz and Andrews (1966)
developed a computer program which embodied the Ford procedure.
In order to satisfy Ford's partitioning assumption, the program
incorporated a means for separating universal highs and universal
lows prior to computation of weights. The assumption could also be
violated if some objects were not rated relative to other objects,
or if some objects fell in a subset such that comparisons were all
in one direction. Addition of a small constant to each cell of the
matrix A resolved these last two violations. The program did not
provide a means to maintain the identity of each judge or to
examine the consistency of the judges with one another.
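Ford's partition assumption is equivalent to requiring that the directed graph with an edge from i to j whenever i was preferred to j at least once be strongly connected. A minimal Python sketch of that check follows; the function names are hypothetical and are not taken from the Pelz and Andrews program.

```python
from typing import List, Set


def reachable(adj: List[List[bool]], start: int) -> Set[int]:
    """Return every node reachable from `start` by following edges in adj."""
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, edge in enumerate(adj[i]):
            if edge and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def satisfies_ford_assumption(wins: List[List[float]]) -> bool:
    """True when the 'was preferred to' graph is strongly connected.

    wins[i][j] is the number of times object i was preferred to object j.
    """
    n = len(wins)
    if n == 0:
        return False
    beats = [[wins[i][j] > 0 for j in range(n)] for i in range(n)]
    loses = [[wins[j][i] > 0 for j in range(n)] for i in range(n)]
    # Strongly connected: every node reachable from node 0 in the graph
    # and in the graph with all edges reversed.
    return len(reachable(beats, 0)) == n and len(reachable(loses, 0)) == n


cycle = [[0, 2, 0], [0, 0, 1], [1, 0, 0]]          # 0 beats 1, 1 beats 2, 2 beats 0
universal_high = [[0, 2, 1], [0, 0, 1], [0, 0, 0]]  # object 0 never loses
print(satisfies_ford_assumption(cycle))
print(satisfies_ford_assumption(universal_high))
```

A universal high or low, an object that was never compared, or a block of objects whose comparisons all run in one direction each break strong connectivity, which matches the three kinds of violation described above.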

The purpose of this study was threefold:

1.  To adapt the present computer program for use on the
IBM 360/67 computer.

2.  To validate the program and model statistically.

3.  To indicate the implications of adapting such a program
for use in the Navy.

The  assumption  was  made  a  priori  that  the  capability  of 
judges  was  uniform  throughout  the  experiment. 


II.   METHOD 

A.   TEST  CATEGORIES 

Proper validation of the program required testing and a
comparison of the program results with a known true order of
items or a universally accepted standard. Four test categories of
verbal items were selected in which the items listed were highly
familiar to all subjects tested. These categories contained items
which had at least a .9 correlation over test subjects in the
category norms for verbal items compiled by Battig and Montague
(1969).

In order to ensure that there would be ties and missing data in
the testing, the following criteria were used for item selection
within each category. First, items were grouped into approximately
four groups; within each group, items differed by a one to three
percent change in frequency of occurrence based on the Thorndike
and Lorge (1944) general count. Second, a five to ten percent
change in the frequency of occurrence separated the groupings.
These criteria yielded essentially a clustering in four ranges of
frequency of occurrence of verbal items.

Comparing things two at a time and judging which has the higher
rank, and ranking all n things simultaneously (that is, judging
n(n-1)/2 comparisons at once), are substantially equivalent
procedures. However, comparing two things at a time allows
inconsistencies (intransitivity) to appear within the judgments
of individuals, and it is sometimes harder in practice for people
to judge n things simultaneously than to compare them two at a
time. In order to eliminate these two problems, items were not
presented in pairs. Rather, the entire category was included on
one testing sheet so that each individual could see all items,
essentially combining the two methods. Appendix B shows the
categories and the items in each category in order of their
frequency of occurrence.

B.  SUBJECTS 

Twenty male and female subjects ranging in age from 24 to 37
years, with comparable levels of education, were selected. Each
subject was used twice. Ten subjects were assigned at random to
each of the four test categories.

C.  PROCEDURE 

Subjects were given a standard set of instructions explaining the
ranking procedure to be used (Appendix A). Subjects were required
to work rapidly and to give their first impression as to
assignment. The items were placed on the test sheet in a random
order. The subjects' replies were recorded on the testing sheet.

D.  DESIGN 

Compiled data were input to the program and an overall ranking
obtained. When not all items are compared by all judges, a sign
test or a signed-rank test is the indicated method of evaluation
[Abelson and Bradley 1954]. The use of Analysis of Variance was
not appropriate. In the formation of subjective tests the
assumptions of Analysis of Variance are seriously suspect [Bradley
1955]. Analysis of Variance also requires quantifiable data, not
qualitative data. Dixon (1953) has shown that the sign test
compares more favorably with Analysis of Variance for small
samples than indicated by results on relative efficiency. In
investigating the power of paired comparisons, it was found that
the efficiency of the method of paired comparisons relative to
Analysis of Variance, under conditions appropriate to Analysis of
Variance, was $t/(\pi(t-1))$, where t is the number of treatments
being considered. When t = 2 the efficiency reduces to $2/\pi$,
the relative efficiency of the sign test.
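As a worked illustration of the relative-efficiency expression, writing $E(t)$ (a label introduced here, not used in the source) for the efficiency with t treatments gives:

```latex
E(t) = \frac{t}{\pi\,(t-1)}, \qquad
E(2) = \frac{2}{\pi} \approx 0.637, \qquad
E(3) = \frac{3}{2\pi} \approx 0.477, \qquad
\lim_{t \to \infty} E(t) = \frac{1}{\pi} \approx 0.318
```

The efficiency therefore falls as treatments are added and is bounded below by $1/\pi$.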

The data were amenable to a Wilcoxon signed rank test with the
hypothesis that the treatments are equal. A Spearman rank
correlation was computed and the hypothesis that rho = 0 was
tested. The Kendall coefficient of concordance was not used but is
similar to the Spearman rank order correlation, with a hypothesis
that tau = 0. All tests were done at a significance level of .05.

E.   COMPUTER  PROGRAM 

The computer program performed an overall rank-ordering of
partially ordered data. The program is set up in three parts: the
main program and two subroutines.

1.   Main  Program 

The main program performed essentially two functions after
reading all the input data. Beginning with the first judge, a
sequential ID number was assigned to the original ID number of
each object judged, in order of appearance. The procedure
continued until all objects were accounted for. No duplicate
assignments were made. Assigned ID numbers were used throughout
the program, and the original ID numbers were stored until the
final printout of weights.

Beginning  with  each  judge  a  count  was  made  of  the 
number  of  comparisons  which  were  to  be  made  between  each  pair 
of  objects.   No  comparisons  were  made  between  objects  tied, 
that  is,  objects  assigned  to  the  same  rank. 

2.   Subroutine  C0RE2

The first section tabulates, for each individual comparison, the
number of times that comparison was made by all judges in the
experiment. The tabulation was done sequentially from the input
rank orderings of the judges and gave the number of times object
i was preferred to object j, that is, the win-loss matrix.
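The tabulation described above can be sketched in Python as follows. This is an illustration only: the data layout (one dictionary per judge mapping an object ID to its rank), the convention that rank 1 is the most preferred, and the function name are all assumptions, not details of the original program.

```python
from itertools import combinations
from typing import Dict, List


def win_loss_matrix(judgments: List[Dict[int, int]], n_objects: int) -> List[List[int]]:
    """Count how often object i was ranked above object j across all judges.

    Each judgment maps object ID (0-based) -> rank, with rank 1 assumed to be
    the most preferred. Objects a judge skipped are simply absent (missing
    data); objects sharing a rank are tied and produce no comparison.
    """
    wins = [[0] * n_objects for _ in range(n_objects)]
    for ranks in judgments:
        for i, j in combinations(sorted(ranks), 2):
            if ranks[i] < ranks[j]:
                wins[i][j] += 1
            elif ranks[j] < ranks[i]:
                wins[j][i] += 1
            # equal ranks: tie, no comparison is counted
    return wins


judgments = [
    {0: 1, 1: 2, 2: 2},  # judge 1: object 0 first, objects 1 and 2 tied
    {0: 1, 2: 2},        # judge 2: did not judge object 1
    {1: 1, 0: 2},        # judge 3: did not judge object 2
]
print(win_loss_matrix(judgments, 3))  # [[0, 1, 2], [1, 0, 0], [0, 0, 0]]
```

Because ties and skipped objects add nothing to either cell of a pair, different pairs can end up with different numbers of comparisons, which is the situation Ford's method is designed to accept.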

The next section, through a series of logic switches, determined
which objects were rated "universal highs" or "universal lows"
and removed them from the weight calculation. The appropriate
rows and columns of the win-loss matrix were also removed.
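A minimal sketch of this separation step is given below. The text does not define "universal high" or "universal low" precisely, so the definitions used here (an object that won at least one comparison and lost none, or lost at least one and won none) are assumptions, as is the single-pass design and the function name.

```python
from typing import List, Tuple


def split_universals(
    wins: List[List[int]],
) -> Tuple[List[int], List[int], List[int], List[List[int]]]:
    """Separate universal highs and lows from a win-loss matrix.

    Returns (highs, lows, kept, reduced) where `kept` lists the indices that
    remain and `reduced` is the win-loss matrix restricted to those indices.
    """
    n = len(wins)
    highs, lows, kept = [], [], []
    for i in range(n):
        won = sum(wins[i])
        lost = sum(wins[j][i] for j in range(n))
        if won > 0 and lost == 0:
            highs.append(i)   # assumed definition: never lost a comparison
        elif lost > 0 and won == 0:
            lows.append(i)    # assumed definition: never won a comparison
        else:
            kept.append(i)
    # Drop the rows and columns of the removed objects.
    reduced = [[wins[i][j] for j in kept] for i in kept]
    return highs, lows, kept, reduced


wins = [[0, 1, 2], [1, 0, 0], [0, 0, 0]]
print(split_universals(wins))  # ([], [2], [0, 1], [[0, 1], [1, 0]])
```

Removing one object can expose another that is a universal high or low within the reduced matrix; whether the original program repeated the pass is not stated, so the sketch does not.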

The next section computed the initial weighting factor from the
win-loss matrix. A small constant, .00001, was added to each cell
in the matrix to prevent violation of Ford's partition assumption.

3.   Subroutine  C0RE3

In this subroutine the final calculations were made for the new
weighting factor, and the new weighting factor was then compared
to the old factor to determine whether the preset convergence
criterion was met (no object's weighting factor changed by more
than .005 between iterations). If this criterion was not met, the
count of iterations was compared against the maximum number
supplied as input in order to determine whether the program would
terminate without convergence.
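The iteration and stopping rule can be sketched in Python as below. The small constant (.00001) and the tolerance (.005) come from the text; the uniform starting weights, the normalization of the weights to sum to one, the simultaneous update of all weights, the default of fifty iterations, and the function name are assumptions made for illustration.

```python
from typing import List, Tuple


def ford_weights(
    wins: List[List[float]],
    eps: float = 1e-5,
    tol: float = 0.005,
    max_iter: int = 50,
) -> Tuple[List[float], int, bool]:
    """Iterate the weight update until no weight changes by more than tol.

    Returns (weights, iterations_used, converged).
    """
    n = len(wins)
    # Add the small constant to every off-diagonal cell.
    a = [[wins[i][j] + (eps if i != j else 0.0) for j in range(n)] for i in range(n)]
    w = [1.0 / n] * n  # assumed starting point; the original derives it from the matrix
    for iteration in range(1, max_iter + 1):
        new_w = []
        for i in range(n):
            total_wins = sum(a[i][j] for j in range(n) if j != i)
            denom = sum((a[i][j] + a[j][i]) / (w[i] + w[j]) for j in range(n) if j != i)
            new_w.append(total_wins / denom)
        scale = sum(new_w)
        new_w = [x / scale for x in new_w]  # assumed normalization
        change = max(abs(x - y) for x, y in zip(new_w, w))
        w = new_w
        if change <= tol:
            return w, iteration, True
    return w, max_iter, False


weights, used, converged = ford_weights([[0, 3, 4], [1, 0, 3], [0, 1, 0]])
print([round(x, 3) for x in weights], used, converged)
```

Sorting the objects by descending weight yields the overall rank order. Because the tolerance is an absolute change, the number of iterations needed depends on how the weights are scaled, so iteration counts from this sketch should not be compared with those reported for the original program.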

III.   RESULTS


The experimental results are compiled in Appendix C. These
results were studied in an attempt to pinpoint any significant
differences between the experimental results and the accepted
standard of Thorndike and Lorge. It should be remembered that
these results pertain to the specific type of testing used. The
results did indicate some obvious similarities between the
methods and permit some fairly general conclusions.

Three means are available to analyze ranked data: the Wilcoxon
signed rank test, the Spearman rank correlation, and the Kendall
coefficient of concordance. All three tests measure essentially
the same relationships between the sets of data; however, the
method of each is different.

The signed rank test takes into account the magnitude of the
observed differences between the data sets. The hypothesis tested
at the .05 significance level was that there was no difference
between the effects of the two treatments. This hypothesis was
not rejected for any of the four categories.

For ease of computation the Spearman rank order correlation was
used in lieu of the Kendall coefficient of concordance. However,
the same conclusion would be reached, namely to accept or reject
the null hypothesis, by computing the Kendall coefficient of
concordance [Ostel 1963]. The Spearman rank order correlation
yielded the following for each category tested:

Category    rho

I           .521
II          .598
III         .687
IV          .460

The hypothesis tested was rho = 0. In Category II and Category
III the hypothesis was rejected at the .05 level when a two-tailed
t test was used. In Category I and Category IV the computed t
value indicated a failure to reject the hypothesis that rho = 0.
This can be explained, however, by looking at the rankings of the
program and the Thorndike and Lorge count. In Category I the
computed difference in score was excessive for only one pair, the
material "felt." In view of the present day use of synthetic
materials it is believed that familiarity with felt is quite low
compared to what it was 26 years ago. In Category IV there were
two diseases, typhoid and syphilis, which were exactly reversed in
the program ranking. This reversal yielded a large difference in
score for each of the two pairs. Today's cleanliness standards and
medical developments have lessened familiarity with typhoid, which
would place it low in a current ranking. Because the majority of
the test subjects were military personnel with a broad background,
a greater familiarity with syphilis and a tendency to place this
item high on any ranking list could be expected.

Recomputation of the correlation coefficient for Category I with
"felt" removed yielded rho = .788. In this case the t statistic
indicated rejection of the hypothesis that rho = 0.
Recomputation of the correlation coefficient for Category IV,
with either "typhoid" or "syphilis" removed or with the difference
in score for each of the two pairs reduced to one-half of its
present value, yielded rho = .585. The t statistic indicated
rejection of the hypothesis that rho = 0.
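The accept and reject decisions reported for the four categories can be checked with the usual t approximation for a rank correlation, t = rho * sqrt((n - 2) / (1 - rho^2)) with n - 2 degrees of freedom. The sketch below assumes n = 12 items per category (the number given in the instructions to judges) and uses the tabulated two-tailed .05 critical value of 2.228 for 10 degrees of freedom; the text does not state the n actually used in the test.

```python
import math


def spearman_t(rho: float, n: int) -> float:
    """t statistic for testing rho = 0 with n ranked items (n - 2 d.f.)."""
    return rho * math.sqrt((n - 2) / (1.0 - rho * rho))


T_CRITICAL_DF10 = 2.228  # two-tailed, alpha = .05, 10 degrees of freedom
REPORTED_RHO = {"I": 0.521, "II": 0.598, "III": 0.687, "IV": 0.460}

decisions = {}
for category, rho in REPORTED_RHO.items():
    t = spearman_t(rho, n=12)  # n = 12 is an assumption, see lead-in
    decisions[category] = abs(t) > T_CRITICAL_DF10
    verdict = "reject" if decisions[category] else "fail to reject"
    print(f"Category {category}: rho = {rho:.3f}, t = {t:.3f}, {verdict} rho = 0")
```

Under these assumptions the statistic exceeds the critical value for Categories II and III only, which agrees with the decisions reported in the text.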

Rank-order stability was reached after the first iteration for
Categories I, III, and IV. Stability was reached for Category II
after the third iteration. Category I converged in thirty-five
iterations and Category III converged in sixteen iterations. No
convergence was reached for Category II and Category IV after
fifty iterations. Figures 1, 2, 3, and 4 show how the weights of
each object changed over fifty iterations for each category. Four
objects in Category III were rated as universal highs and were
removed prior to computation of weights.


[Figure 1. Weight Change Per Iteration - CATEGORY I]

[Figure 2. Weight Change Per Iteration - CATEGORY II]

[Figure 3. Weight Change Per Iteration - CATEGORY III]

[Figure 4. Weight Change Per Iteration - CATEGORY IV]

IV.   DISCUSSION 


Evaluation of the results by both the Wilcoxon signed rank test
and the Spearman rank order correlation indicated that the
program produced the proper rank order for the items tested. The
results obtained were statistically valid and appear consistent
with real world data. The results can only be considered
approximately correct for the categories examined, since they
were subject to variations in the ranking by judges. One source
of variation occurs when the quality being measured is not known
with certainty to be representable as a linear variable. An
observer may rank a number of objects on this quality believing
that he is doing something within his powers. However, if this
quality is not measurable on a linear scale, the ranking may fail
to give a real picture either of the observer's preference or of
the variation of the quality among the objects. Another source of
variation occurs when an observer produces a configuration of
preferences which shows inconsistencies. There are usually
several explanations: he may be an incompetent judge, the objects
may be so alike that consistent differentiation is not possible,
or his attention may wander during the course of the experiment.
The second source of variation was not considered in the
experiment, since it was assumed a priori that the standards of
judging were uniform across the subjects tested. The first
source of variation was not a factor because of the way in which
the items were selected.

The correlations obtained were not expected to be very high,
owing to the method of selecting items to ensure ties and missing
data. This procedure for item selection provided a check of the
program's method of ranking. Another reason for somewhat lower
correlations was that the reference material used for comparison
[Thorndike and Lorge 1944] is somewhat outdated.

V.   APPLICATIONS  FOR  USE  OF  PROGRAM  IN  U.S.  NAVY

There are many instances in the Navy where such a program may
prove invaluable as a labor saving device. Ranking procedures are
used throughout the Navy in various forms. Although none of the
systems in use has been examined in order to determine its
relative efficiency, each system requires a large number of
manhours, and the results may still be biased by many factors
unknown to the individual or agency assembling the overall
ranking.

A.   SPECIFIC  USES 

Each year the Navy receives and makes proposals for various
research programs. The amount of money spent and the number of
feasibility studies undertaken in order to determine which
programs should have priority for development and which should be
discarded are not known. It is easy to imagine how a ranking
program might be used to determine which proposals should be put
into committees for further study and evaluation.

Many times each year military officers are available for
assignment to new billets and changes of duty stations. In order
to determine proper assignment, a ranking procedure is used which
takes into account an officer's performance based on fitness
reports, his desires from a preference card, and several other
factors. The computer program cited would facilitate a large
reduction in the manhours spent tabulating this data and ranking
the officers within the group for assignment. A modification to
the existing program would be required in order to give more
weight to certain factors and to provide a means of weighting the
competence of certain judges.

When military personnel are transferred, many questionnaires are
filled out rating the supply facility which handled the movement
of their household goods and the shipping firm which did the
actual moving. A ranking procedure in this case would point out
which facilities are doing a good job and which are substandard.

The preceding paragraphs have pointed out three of the many uses
to which a computer ranking procedure may be put.

B.   AN  EXAMPLE 

The Naval Personnel and Training Research Laboratory at San Diego
and the Naval Personnel Research and Development Laboratory at
Washington, D.C. annually publish documents which describe
programs that have been developed or contracted for by the Navy.
From these documents 10 titles were randomly selected and
abstracts prepared describing the programs selected. Ten Naval
officers from the Naval Postgraduate School were asked to rank
these programs on their desirability and need for retention and
further development within the Navy. Subjects were given a
standard set of instructions to be used in the ranking procedure
(Appendix D).

VI.   CONCLUSIONS

The computer program cited provided a simple and easy means for
combining sets of partially ordered data. The program produced a
rank order which appeared consistent with the real world order of
the objects that were ranked. Tests indicated that there was no
statistically significant difference between the rank order
produced by the program and the true order. This result must
remain somewhat tentative in view of the fact that more extensive
experimentation was not conducted. The assumption that there were
uniform standards of judging across the subjects tested was
logical in view of the test results.

A ranking program of this nature would be valuable for use in the
military. It could be used to replace or supplement existing
ranking procedures. The simplicity of such a program would yield
reductions in the manhours and costs of most systems now in
operation. In some situations modifications to the program would
be required to include a provision for giving additional weight
to certain factors which would be more important in the ranking
procedure.

APPENDIX  A

INSTRUCTIONS  TO  JUDGES  -  TESTING  PHASE

The purpose of this experiment is to achieve an ordinal ranking
of similar objects which belong to different categories. Each
category will contain a list of twelve objects belonging to that
category. For each category, you are to make an ordinal ranking
of the objects in that category as to what you believe their
relative familiarity is to all people in general. You are
requested to judge only those objects which you can rank with
confidence. You are permitted to use as many ordinal ranks for
each category as you deem necessary, and to place as many objects
in each rank as you choose. In order to simplify the procedure,
after looking at the words in each category, select the number of
ordinal ranks which you will use. Write the number of the rank
next to the object you are assigning to that rank for the objects
you choose to judge. Work as rapidly as possible and give your
first impression as to assignment.

Are  there  any  questions? 




APPENDIX  B

ACCEPTED  STANDARD  RANK  ORDERING
OF  ITEMS  BY  CATEGORY


CATEGORY  I

(4)  cotton
(5)  felt
(12)  wool
(2)  lace
(9)  velvet
(6)  canvas
(8)  muslin
(3)  piqué
(10)  rayon
(11)  corduroy
(7)  denim
(1)  batiste

CATEGORY  II

(8)  salt
(6)  sugar
(9)  sage
(10)  ginger
(11)  vinegar
(12)  cloves
(5)  mustard
(2)  cinnamon
(4)  nutmeg
(3)  thyme
(7)  basil
(1)  cayenne

CATEGORY  III

(10)  cup
(11)  bowl
(4)  knife
(12)  fork
(9)  refrigerator
(5)  saucer
(3)  sieve
(6)  skillet
(8)  ladle
(2)  scraper
(7)  toaster
(1)  cleaver

CATEGORY  IV

(7)  cold
(9)  rheumatism
(4)  typhoid
(1)  cancer
(10)  smallpox
(11)  cholera
(6)  measles
(2)  rheumatic fever
(5)  syphilis
(8)  diabetes
(12)  dysentery
(3)  peritonitis


ID numbers assigned to items in each category are
in parentheses.




APPENDIX  C 

COMPUTER  RANK  ORDERING 
OF  ITEMS  BY  CATEGORY 


CATEGORY  I 

(4) 

cotton 

(12) 

wool 

(2) 

lace 

(6) 

canvas 

(11) 

corduroy- 

(9) 

velvet 

(7) 

denim 

(10) 

rayon 

(5) 

feit 

(3) 

pique ' 

(8) 

muslin 

(1) 

batiste 

CATEGORY  II 


CATEGORY    III 


(8) 

salt 

(6) 

sugar 

(5) 

mustard 

(11) 

vinegar 

(12) 

cloves 

(2) 

cinnamon 

(4) 

nutmeg 

(10) 

ginger 

(9) 

sage 

(7) 

basil 

(1) 

cayenne 

(3) 

thyme 

(12) 

fork 

(5) 

saucer 

(4) 

knife 

(10) 

cup 

(11) 

bowl 

(6) 

skillet 

(7) 

toaster 

(9) 

refrigerator 

(1) 

cleaver 

(8) 

ladle 

(2) 

scraper 

(3) 

sieve 

CATEGORY IV

(7)  cold
(1)  cancer
(6)  measles
(5)  syphilis
(8)  diabetes
(9)  rheumatism
(10)  smallpox
(11)  cholera
(4)  typhoid
(12)  dysentery
(2)  rheumatic fever
(3)  peritonitis


ID numbers assigned to items in each category are
in parentheses.
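
Appendices B and C give, for each category, the accepted standard order and the order produced by the computer program. One common way to summarize how closely two such orderings agree is Spearman's rank correlation. The sketch below applies it to the Category IV ID numbers, assuming each list is printed from first rank to last. The function names are illustrative, and this is not necessarily the statistic used in the study.

```python
def rank_positions(order):
    """Map each item ID to its 1-based position in a ranked list."""
    return {item: pos for pos, item in enumerate(order, start=1)}


def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two tie-free orderings of the same items."""
    if len(set(order_a)) != len(order_a) or set(order_a) != set(order_b):
        raise ValueError("both orderings must list the same items exactly once")
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two items")
    pos_a = rank_positions(order_a)
    pos_b = rank_positions(order_b)
    # Sum of squared rank differences over all items.
    d2 = sum((pos_a[item] - pos_b[item]) ** 2 for item in pos_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Category IV ID numbers, copied from Appendix B (accepted) and Appendix C (computer).
accepted_iv = [7, 9, 4, 1, 10, 11, 6, 2, 5, 8, 12, 3]
computer_iv = [7, 1, 6, 5, 8, 9, 10, 11, 4, 12, 2, 3]

rho_iv = spearman_rho(accepted_iv, computer_iv)
print(f"Category IV Spearman rho: {rho_iv:.3f}")
```

A value of 1 means the two orderings are identical and -1 means one is the exact reverse of the other.
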
APPENDIX D

INSTRUCTIONS  TO  JUDGES 
APPLICATION  EXAMPLE 

The purpose of this experiment is to achieve an ordinal
ranking of the desirability of developing, maintaining,
and/or continuing certain research programs within the U.S.
Navy. You will be given a description of ten research
programs currently in use or being proposed by the various
research laboratories in the Navy. You are to make an ordinal
ranking of the programs as to what you believe their
desirability and need are for retention and further development
within the Navy. You are requested to judge those areas
that you feel you can rank with confidence. You are
permitted to use as many ranking categories as you deem
necessary, and to place as many programs in each category as
you choose. In order to simplify the procedure, after reviewing
the programs, select the number of ranking categories
which you will use. Write the number of the category next to
the program you are assigning to that category for the programs
you choose to judge. Work rapidly and give your first impression
as to assignment.

Are  there  any  questions? 
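
The instructions above let each judge use any number of ranking categories, place several programs in one category, and skip programs altogether. A minimal sketch of how such judgments could be turned into pairwise comparisons and then into a rank order follows. It assumes that a lower category number means a more desirable program and that two programs placed in the same category yield no comparison. The weights are estimated with the fixed-point iteration usually associated with Ford's 1957 procedure. The item labels, the judgments, and all function names are made up for illustration; this is not the thesis program.

```python
from itertools import combinations


def pairwise_wins(judgments, items):
    """Count decisive pairwise preferences implied by category assignments.

    judgments: one dict per judge, {item: category number}; unjudged items
    are simply absent. Assumption: lower category number = more preferred.
    """
    wins = {i: {j: 0 for j in items if j != i} for i in items}
    for assigned in judgments:
        for a, b in combinations(sorted(assigned), 2):
            if assigned[a] < assigned[b]:
                wins[a][b] += 1
            elif assigned[b] < assigned[a]:
                wins[b][a] += 1
            # Same category: treated as no comparison (an assumption).
    return wins


def ford_weights(wins, iterations=200):
    """Iterate w_i = W_i / sum_j n_ij / (w_i + w_j), renormalizing each pass."""
    items = list(wins)
    w = {i: 1.0 / len(items) for i in items}
    for _ in range(iterations):
        new_w = {}
        for i in items:
            total_wins = sum(wins[i].values())
            denom = sum(
                (wins[i][j] + wins[j][i]) / (w[i] + w[j])
                for j in items
                if j != i and wins[i][j] + wins[j][i] > 0
            )
            # An item that was never compared keeps its current weight.
            new_w[i] = total_wins / denom if denom > 0 else w[i]
        scale = sum(new_w.values())
        w = {i: v / scale for i, v in new_w.items()}
    return w


# Hypothetical judgments from five judges; note the tie and the missing entries.
items = ["A", "B", "C"]
judgments = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 1, "C": 2},
    {"B": 1, "C": 2},
    {"C": 1, "A": 2},
    {"A": 1, "B": 2},
]

wins = pairwise_wins(judgments, items)
weights = ford_weights(wins)
ranking = sorted(items, key=weights.get, reverse=True)
print(ranking)
```

The iteration is only well behaved when the comparisons connect the items, in the sense that no group of items wins every comparison against the rest; the example data were chosen to satisfy that condition.
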
APPENDIX E

COMPUTER  RANK  ORDERING  OF  RESEARCH  PROJECTS 
APPLICATION  EXAMPLE 

105 TITLE: Improved Enlisted Personnel Distribution and
Management.

DESCRIPTION: A computer-assisted distribution and assignment
(CADA) system is being designed to help improve the
utilization of enlisted manpower. A preliminary model is
currently being implemented in the Pacific Fleet. A prototype
model is now under development for application in
BUPERS in support of centralized management of enlisted
ratings. Related research results include development of
computer and mathematically based procedures for (1) the
equitable allocation of personnel resources, (2) the
optimal match of man and billet, (3) the identification of
billet vacancies in order of priority, (4) the projection
of the number of distributable assets, and (5) the feedback
of information on the results of distribution
management actions.

101 TITLE: Ship Manning Requirements Techniques

DESCRIPTION: The increasing sophistication and complexity
of naval ships, systems, and equipments in the
face of Project Volunteer and a smaller Navy require
the development of methods which will improve the accuracy
of manpower requirements forecasting and manpower
utilization.

A technique for defining and documenting manpower
requirements for ships based on the application of selected
work study techniques to basic manning criteria in
each of the separate work areas aboard ship has been
developed. It permits the production of a document which
displays in detail the rationale for manning by ship
classes based on equipment and required operational
capabilities to meet mission assignment.

104 TITLE: Evaluation of Standards for Navy Reenlistment.

DESCRIPTION: This research was generated out of concern
over the quality of reenlistees. Unsatisfactory performance
was costing the military services enormous amounts
of money in such things as reenlistment bonuses and pay
and allowances for reenlistees from whom commensurate
service was not realized. Court and confinement costs of
reenlistees were cited. It was suspected that personnel
of inferior quality were being allowed to reenlist,
including some with unsatisfactory first term records.

In an attempt to identify unsatisfactory individuals
prior to reenlistment, comparisons were made between
unsatisfactory and satisfactory reenlistees on information
available at the time of the reenlistment decision. The
project also provided information on the effect on manning
which would result if reenlistment standards were made
more stringent.

102 TITLE: Development of Navy Military Personnel Costing
Techniques for Use in Determining Cost Implications
Associated with Changes in Reenlistment
Rates.

DESCRIPTION: Thousands of skilled technicians are required
to operate and maintain the complex systems and
equipment now in the Fleet. The Navy constantly experiences
difficulty in retaining these technicians because
of competition for them from other sectors of the
economy.

To alleviate this problem, several technician-oriented
procurement programs and career incentive programs are
employed. To facilitate evaluation of these programs, a
methodology for determining the relative cost benefits
associated with retention of personnel has been developed.

103 TITLE: Design of an Optimum Personnel Force Structure.

DESCRIPTION: An optimum force structure containing
appropriately qualified personnel in sufficient numbers at
least cost cannot now be certified. This project is concerned
with the development of improved techniques to
analyze and balance the relationship between personnel
requirements and the composition of the existing force
structure.

106 TITLE: Interest Measurement in Officer Selection.

DESCRIPTION: Each year several thousand young men apply
for officer training programs at the Naval Academy and
NROTC units at various colleges. High attrition rates
are experienced in both training and active duty. To
reduce the cost of losing substantial proportions of
these men, it is imperative that those applicants having
the greatest career potential be identified in the selection
process. Several years of research on vocational
interest tests and biographical questionnaires have made
it possible to identify those applicants most likely to
successfully complete officer training and remain in the
Navy after completing their minimum requirements.

110 TITLE: Evaluation Survey of the Effectiveness of Submarine
Sonar Operator Training.

DESCRIPTION: A comprehensive survey was accomplished of
the proficiency, training, and utilization of submarine
sonar technicians and sonar watchstanders. The survey
provided up-to-date information concerning the efficiency
of training procedures. Such information is necessary on
a periodic basis to insure appropriate alignment of the
training to fleet requirements in order to prevent serious
impairment of operational fleet submarine ASW
efficiency. Data gathering instruments included interview
forms, self ratings, supervisor ratings, knowledge tests,
and performance tests.

107 TITLE: Marginal Personnel/Minority Group Testing.

DESCRIPTION: Present test batteries used in both military
and civilian settings have been criticized for alleged
inequities when used with groups defined on the basis of
race or ethnic affiliation. Public policy as well as
efficient manpower utilization requires that all personnel
be afforded equality of opportunity in assignment and that
those abilities being measured bear relevance to skills
required on the job.

109 TITLE: Personnel Cost Research for Early Man/Machine
Design Trade-Offs.

DESCRIPTION: The critical element of personnel cost has
not been systematically considered when making system
design and development decisions early in the system
development cycle. No tools exist to enable the
cost-effectiveness of such decisions to be measured. For this
reason, research was undertaken to develop a personnel
cost model for use in personnel and man-equipment trade-off
decisions. A basic model was accomplished which allowed
the identification of all pertinent cost items and the
accumulation of cost elements in an unequivocal manner.

108 TITLE: LOFARGRAM Analysis Procedures.

DESCRIPTION: The airborne JEZEBEL system has shown great
potential as a means of detecting and classifying underwater
contacts; however, its usefulness has been continually
hampered by the lack of adequately trained operators.
One of the main reasons for operator deficiencies is that
training programs have been seriously hampered by the lack
of a standardized, systematic procedure for analyzing the
information displayed on the gram, which is the main display
component of the system.

In order to correct this situation, a systematic
LOFARGRAM procedure was developed.


ID numbers assigned to abstracts are to the left of the
title.


APPENDIX F


EVALUATIONS  OF  ITEMS  PER  CATEGORY  BY  TEN  JUDGES 


CATEGORY  I 

Evaluations by Judges

True
Order   Object   I   II   III   IV   V   VI   VII   VIII   IX   X

1 

4 

1 

1 

1 

1 

1 

1 

1 

3 

2 

1 

2 

5 

4 

3 

1 

2 

7 

2 

4 

7 

2 

3 

12 

2 

2 

1 

1 

2 

1 

1 

1 

1 

4 

2 

5 

2 

1 

2 

4 

2 

4 

2 

1 

5 

9 

6 

3 

1 

2 

5 

1 

3 

9 

1 

6 

6 

3 

2 

1 

3 

3 

1 

2 

5 

] 

7 

8 

10 

4 

2 

5 

6 

2 

5 

8 

3 

8 

3 

3 

8 

3 

11 

3 

9 

10 

7 

3 

2 

4 

5 

] 

2 

10 

2 

10 

11 

8 

3 

1 

3 

4 

1 

3 

6 

1 

11 

7 

3 

2 

5 

5 

o 

o 

o 

2 

12 

1 

3 

8 

3 

12 

2 

3 

CATEGORY  II 
Evaluations  by  Judges 


True
Order   Object   I   II   III   IV   V   VI   VII   VIII   IX   X


10  3         11  12 

11  112  12 


4       12       2   1   1   1   1   1   1   1   4   1

5       9        5   2   2   1   3   1   4   2   5   1

6       5        4   1   1   1   5   1   2   3   4   1

7       3
APPENDIX G


EVALUATION  OF  ABSTRACTS  BY  TEN  JUDGES 